Abstract

One-class classifiers (OCCs) are used to solve many kinds of problems because they can represent a class distribution
regardless of the remaining classes. Their main advantage for multi-class classification is that they yield an open system,
which can be extended to new classes without retraining the existing OCCs. So far, hidden Markov models, support vector
machines and neural networks are the most widely used classifiers for Arabic word recognition, and they provide systems with
a closed lexicon. In this paper, OCCs are explored in order to build an Arabic word recognition system with an open lexicon.
Generally, pattern recognition systems built on a single system suffer from limitations such as the lack of uniqueness and
non-universality. Thus, combining multiple systems has become an attractive research topic for improving performance and
robustness. Fixed rules are commonly used as combiners for hybrid OCC ensembles. The present paper proposes a combination
scheme for OCCs based on fuzzy integral (FI) operators. Furthermore, an alternative framework is proposed to design a
parameter-independent, open-lexicon handwritten Arabic word recognition system, together with a new density measure
function. Experimental results obtained on an Arabic handwritten dataset using different types of OCCs with a large
number of classes highlight the superiority of FI for hybrid OCC ensembles.


Keywords One-class classifiers - Hybrid OCC ensemble - Fuzzy integral - Density measures - Open-lexicon Arabic word 
recognition 


1 Introduction 


Handwritten Arabic word recognition is an active research
field because of its use in applications such as automatic
sorting of postal mail, automatic bank check processing, bill
processing, passport validation and, more recently, reading
historical documents via text-to-speech applications, helping
blind people to read and recognizing handwritten historical
documents [1-5]. Unlike Latin-script languages, Arabic is
written from right to left. It also has its own diacritical
marks such as the dumma (ُ), the hamza (ء) and the
madda (~). Regarding character shapes, Arabic script has
two main properties. On the one hand, several letters share
the same shape and differ only in the number and position
of dots, such as "djim: ج," "ha: ح" and "kha: خ." On the
other hand, some letters change their shape according to
their position at the beginning, in the middle or at the end of
the word. For instance, the letter "Ain" can be written in
four shapes: "عـ, ـعـ, ـع, ع," where the last two shapes
correspond to end positions and depend on whether the word
is fully connected or not.

So far, Arabic word recognition is considered one of the
most challenging tasks in pattern recognition because of its
specific writing style and its variability. In this context,
the analytical and holistic approaches are the two possible
ways of recognizing an Arabic word [6]. The first one consists
of segmenting a word image into subwords or isolated
characters, which are then recognized through character
recognition. Generally, this approach is employed when a very
large vocabulary is involved, since it is impossible to construct
a specific classifier for each word. Hence, hidden Markov
models (HMM) are probably the most widely used classifiers
for very large vocabularies [7]. In contrast, the holistic
approach is based on a global analysis in which each word is
considered as a single unit. This approach is appropriate for
problems with a large or medium vocabulary such as postal
address recognition [8]. In this case, all kinds of classifiers
can be used, such as binary support vector machines (SVM)
[9, 10]. The main advantage of this approach is that it
efficiently captures the co-articulation and variability effects
contained in word images handled by the same classifier [11].

Hence, the present work focuses on the holistic approach to
handwritten Arabic word recognition. In this respect, HMM
[12], SVM [13], neural networks [14] and hybrids of SVM and
conventional neural networks [15] are extensively used.
However, such classifiers yield an Arabic word recognition
system with a closed lexicon, because adding a new word to
the lexicon requires retraining the whole system.

Nowadays, the ability to extend a multi-class implementation
to new classes is strongly required, for instance, in Arabic
word recognition and handwritten writer identification.
However, existing implementations, such as the one-against-one
(OAO) or one-against-all (OAA) schemes based on SVM
classifiers, need to retrain the system on all classes.
Recently, the one-class classifier (OCC) has been successfully
used to achieve extensible multi-class implementations
[16-21]. Indeed, extending an OCC-based system to new classes
does not require retraining the classifiers already in use.
Furthermore, the OCC has a lower computational cost in terms
of training time and memory space [15, 16].

One-class classifiers are defined as machine learning models
that describe a restricted domain in a multi-dimensional
pattern space using only samples of the target class [22].
Thus, the OCC is considered one of the approaches closest to
human learning for the classification task, owing to its
ability to learn the model of each class independently of the
remaining classes. In this respect, the OCC has been
successfully employed in many applications such as image
retrieval [23], automated document retrieval and
classification [24] as well as the combination of different
biometric traits [25].

In this paper, OCCs are explored in order to build an
open-lexicon Arabic word recognition system. Generally,
pattern recognition systems built on a single system suffer
from limitations such as the lack of uniqueness and
non-universality with respect to the problem at hand [26].
Thus, combining multiple systems, by exploiting the strengths
of each one and avoiding their weaknesses, may improve
classification accuracy. Indeed, the benefits of multiple
classifiers based on different features for the same problem
have been demonstrated in various fields of pattern
recognition, including handwriting recognition [27], speech
verification [28] and other applications [29].

Recently, it has been demonstrated that combining classifiers
can also be effective for OCCs, and therefore, the existing
classifier combination strategies can also be applied to OCCs
[20]. Since information about only one class is available,
combining OCCs is more difficult. Nevertheless, OCC ensembles
have been explored for a variety of applications such as
anomaly detection and distributed intrusion detection in
mobile ad hoc networks [30], image annotation [17] and other
recognition applications. Hence, a distinction should be drawn
between an OCC ensemble for solving one-class problems, a
multi-class implementation, and an ensemble of multi-class
implementations, which represents the hybrid approach.

Usually, the combination step for the hybrid OCC ensemble
is performed through simple combination rules such as fixed
rules [9, 17, 20, 31], error-correcting output code (ECOC)
and decision template (DT) strategies [10, 32, 33]. However,
fixed combiners fail in some difficult cases. Fixed rules are
optimal only in special cases in which the combined systems
are similar in terms of performance and competence. Moreover,
classifiers designed from various information sources differ
from each other, because the members of the ensemble are
built on diverse feature spaces [34]. Therefore, trained
combiners are more suitable, since a priori knowledge about
the ensemble is exploited in the test phase to favor the more
suitable classifier. Thus, the final decision is made by
taking into account the competence of each member.

In this respect, considerable effort has been devoted to
proposing various combination methods and schemes, including
methods based on the Dezert-Smarandache theory (DSmT). For
instance, Abbas et al. [35] proposed a new scheme based on a
one-class support vector machine (OC-SVM) ensemble trained on
different feature sets and combined using the DSmT for
handwritten digit recognition. The DSmT outperforms the sum
rule. However, the proposed scheme forfeits the main advantage
of using OCCs as a multi-class system, since the combination
yields a closed system. Indeed, adding new classes requires
updating all parameters and retraining the combination model.

Other alternative combination methods have been pro- 
posed based on fuzzy sets. In that case, fuzzy integral (FI) 
and the associated fuzzy measures initially introduced by 
Sugeno have been reported to give excellent results for clas- 
sifier aggregation. Its main advantage is related to measuring 
the strength of both individual and subsets of classifiers. The 
ability of the fuzzy integral to enhance the results produced 
by multiple information sources has been proved in various 
application areas of pattern recognition [26-29].

In order to achieve an open and powerful hybrid OCC
ensemble, Hadjadji et al. [36] proposed a combination
scheme based on fuzzy integral operators and studied their
abilities against fixed rules. However, their parameters must
be readjusted each time a new class is added to the system.
Besides, the density measure representing the competence of
each ensemble member is given by its training accuracy.
Consequently, all test samples are represented by the same
density measure values, which makes this approach less
efficient, since each test sample has its own suitable density
measure. Therefore, more weight should be given to the most
appropriate information source for each test sample.

To overcome these drawbacks, this paper investigates an
alternative framework for designing a parameter-independent
open-lexicon Arabic word recognition system, together with a
new density measure function. This framework provides a
dynamic measure for each test sample without the need to
measure performance on a training or validation dataset.
Since OCCs have not yet been evaluated for a large number of
classes, experiments are carried out on a large dataset for
Arabic handwritten word recognition with different types of
OCC, giving a broader view of the usefulness of the proposed
FI-based framework for the addressed problem.

Hence, this paper investigates the use of one-class
classifiers and FI combination strategies for open-lexicon
handwritten Arabic word recognition. The main advantage of
the proposed approach is that it offers an open and efficient
system, which is the first time this has been done for
handwritten Arabic word recognition. Indeed, for an
application such as address recognition, a closed system is a
shortcoming, whereas an open-lexicon system allows the list
of addresses to be updated more efficiently.

The remainder of this paper is organized as follows.
Section 2 gives a brief overview of hybrid OCC ensembles.
Section 3 describes the mathematical formulation of the
proposed hybrid OCC ensemble. In Sect. 4, experiments are
conducted on various types of OCC for handwritten Arabic word
recognition with a large number of classes in order to
demonstrate the effectiveness of the proposed combination
scheme. Finally, the conclusion and future work are provided
in the last section.


2 Brief overview on hybrid OCC ensembles 


A hybrid OCC ensemble is defined as a multi-class
implementation based on single OCC ensembles. It is designed
to enhance the recognition performance and robustness of the
OCC multi-class implementation. As depicted in Fig. 1, the
test pattern is assigned to one of the predefined classes in
two steps. First, a single OCC ensemble is constructed per
class, whose outputs are combined through a predefined
combination rule. Second, the outputs of the multiple single
OCC ensembles are combined through another combination rule
to provide a multi-class decision and therefore to predict the
class label. Note that, in addition to improving performance,
the hybrid OCC ensemble preserves the property of an open
system, since it is possible to add new classes without
retraining all single OCC ensembles.

Fig. 1 Hybrid OCC ensemble topology

This scheme was first explored by Juszczak and Duin [31] for
classifying missing data in multi-class problems. In their
system, a single Parzen OCC ensemble is trained for each
class in the training set. Each ensemble contains one
classifier per feature. During the classification step, only
the available features are classified, and the results are
combined using a fixed rule. Their method offers two
advantages. First, it requires fewer classifiers to be
trained. Second, it does not require retraining the system
whenever feature values are missing. Later,
Goh et al. [17] pointed out that the SVM is affected by
noisy and imbalanced training data. As a solution, they
proposed using OC-SVM to estimate the support vectors of
individual classes. Moreover, a bagging scheme was proposed to
reduce class prediction variance. Finally, the overall class
prediction of the multi-class implementation is the result of
majority voting over the bags. Furthermore, the authors
exploited these advantages of OCCs to construct a dynamic
ensemble that can be used for new class discovery.
Experimental results showed the effectiveness of their system.
In a related work, Muñoz-Marí et al. [9] demonstrated that
using a simple combination rule (e.g., average or product) on
OCCs trained on different feature sets improves image
classification accuracy in remote sensing. They showed that
the ensemble composed of support vector data descriptions
achieves better results than the mixture-of-Gaussians
ensemble. Yeh et al. [10] proposed a combination scheme that
jointly uses Adaboost and OC-SVM with a well-designed
discriminant function for solving multi-class classification
problems. The classifiers returned by Adaboost are then
aggregated through the sum combination rule. In another
work, Krawczyk and Wozniak [32] followed the path of using
diversity measures, which have been widely explored for
conventional multi-class classifiers, to obtain complementary
systems. In their system, OC-SVM ensembles are trained on
different training sets. Then, a diversity measure is applied
to the constructed ensembles in order to select the suitable
ones for each class. Finally, the selected classifiers are
combined using ECOC [33] and DT [34] strategies to provide a
robust system.

Krawczyk and Filipczuk [10] proposed an efficient
medical decision support framework that distinguishes
between benign, malignant and fibroadenoma cases, based on
combining OCCs trained on different features with the ECOC
combination strategy. Experimental evaluation shows the
superiority of their system over some state-of-the-art
systems. In a similar vein, the hybrid OCC ensemble has also
been investigated for medical image classification [37]. The
ensemble consists of one-class KPCA models trained on
different features generated from each image class. Besides,
a new product combination rule was proposed. The
effectiveness of the classification scheme was verified using
a breast cancer biopsy image dataset and a 3D optical
coherence tomography retinal image set. The scheme provides
competitive results on the two medical image sets compared
with state-of-the-art systems. Recently, in a related work,
Krawczyk et al. [38] presented a new method for designing the
hybrid OCC ensemble based on a single OCC ensemble. The main
idea is to create a single OCC ensemble for each class based
on feature space partitioning. The combined classifiers are
trained on clusters, which makes it possible to exploit the
strengths of the individual classifiers. Experiments conducted
on a wide range of benchmark datasets demonstrate the validity
of the proposed framework and its flexibility to work with
different clustering algorithms. In order to improve their
system, Cyganek and Krawczyk [33] proposed splitting the data
using the nonnegative matrix factorization algorithm with
sparsity constraints. This framework relies on splitting the
input data into compact and consistent clusters as well as on
automatically determining the number of clusters. The proposed
method shows high accuracy and fast classification.


3 Proposed hybrid OCC ensemble based 
on fuzzy integral 


The hybrid system is composed of different single OCC
ensembles, each dedicated to representing one class from the
set of classes. Consequently, each class is represented by a
combination of different OCCs. Various combination rules are
possible for achieving an enhanced hybrid OCC ensemble. The
present paper investigates fuzzy integral (FI) operators for
combining multiple single OCC ensembles.

In the following, the main concepts of the fuzzy integral
and the associated fuzzy measures are reviewed for clarity.
Then, the hybrid system is described for combining multiple
single OCC ensembles for multi-class classification by means
of FI.


3.1 Background on fuzzy measure and fuzzy 
integral operators 


Fuzzy integrals are nonlinear combiners defined with respect
to fuzzy measures. The main advantage of FI is its ability to
combine objective evidence, in the form of expert decisions,
while taking into account a subjective evaluation of the
experts' competence expressed by the fuzzy measure. The
present section reviews the main properties of fuzzy measures
as well as the fuzzy integral operators used.


3.1.1 Fuzzy measure and objective evidence 


A measure is defined to express the importance of each
information source and of each possible coalition of
information sources. In the majority of applications, diverse
information sources considered together exhibit some sort of
positive or negative synergy. Therefore, the additive property
of a measure may be too restrictive. In order to overcome this
drawback, Sugeno introduced the concept of the fuzzy measure.

Let $Z = \{z_1, \ldots, z_L\}$ be a finite set of information sources.
A set function $g : 2^Z \to [0, 1]$ is called a fuzzy measure if it
verifies the following properties [39, 40]:


1. $g(\emptyset) = 0$, $g(Z) = 1$
2. If $A, B \in 2^Z$ and $A \subseteq B$, then $g(A) \le g(B)$
3. If $A_n \in 2^Z$ for $1 \le n < \infty$ and the sequence $\{A_n\}$ is
monotone in the sense of inclusion, then


$$\lim_{n \to \infty} g(A_n) = g\left( \lim_{n \to \infty} A_n \right) \tag{1}$$


These properties show that the fuzzy measure is not
necessarily additive; therefore, if $A, B \subset Z$ and
$A \cap B = \emptyset$, then:


$$g(A \cup B) \ne g(A) + g(B) \tag{2}$$

According to this inequality, Sugeno (1977) introduced the
decomposable so-called $\lambda$-fuzzy measure satisfying the
following additional property for $\lambda > -1$:


$$g(A \cup B) = g(A) + g(B) + \lambda g(A) g(B) \tag{3}$$

The $\lambda$-value can be defined as an interaction parameter
between the two sets $A$ and $B$.
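
As a worked illustration of Eq. 3 (the numbers are hypothetical and not taken from the experiments), consider two disjoint sets with $g(A) = 0.3$ and $g(B) = 0.5$ that together cover $Z$, so that $g(A \cup B) = g(Z) = 1$:

```latex
\begin{aligned}
g(A \cup B) &= g(A) + g(B) + \lambda\, g(A)\, g(B) \\
1 &= 0.3 + 0.5 + 0.15\,\lambda \\
\lambda &= \frac{0.2}{0.15} = \frac{4}{3} > 0
\end{aligned}
```

Here the densities sum to less than 1, so $\lambda$ is positive and the measure of the union exceeds the sum of the parts (positive synergy). When the densities sum to more than 1, the same equation yields a negative $\lambda$ in $(-1, 0)$.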

Let $Z = \{z_1, \ldots, z_L\}$ be the set of information sources to
be aggregated in an attempt to achieve better performance.
Each information source $z_i$ is associated with a density
measure $g^i = g(\{z_i\})$, which generally represents the
performance of a single source. In this case, Sugeno defines a
function $h : Z \to [0, 1]$ called an objective evidence with
respect to a fuzzy measure $g$ over $Z$. The set of objective
evidences, denoted $h(z_i)$, is then rearranged from the maximum
to the minimum value as $h(z_1) \ge \cdots \ge h(z_L)$, together
with their corresponding density measures $g^i$. Consequently,
according to this order, the $\lambda$-fuzzy measure $g(A_i)$ of
the new sequence $A_i = \{z_1, \ldots, z_i\}$ can be computed
recursively as follows:


$$g(A_1) = g(\{z_1\}) = g^1 \tag{4}$$


$$g(A_i) = g(A_{i-1}) + g^i + \lambda g(A_{i-1}) g^i, \quad 2 \le i \le L \tag{5}$$

The value of $\lambda$ is deduced by solving the equation
$g(Z) = 1$, which corresponds to solving the following
equation:


$$\lambda + 1 = \prod_{i=1}^{L} \left( 1 + \lambda g^i \right) \tag{6}$$


This value is obtained as the unique real root greater than
$-1$ and not equal to zero.
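
The recursion of Eqs. 4-5 and the root of Eq. 6 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `solve_lambda` and `fuzzy_measures` are hypothetical, bisection is one possible root finder among others, and the densities are assumed to be strictly positive and already sorted by decreasing evidence.

```python
def solve_lambda(densities, iterations=200):
    """Return the root of Eq. 6 greater than -1 (0.0 if the densities sum to 1)."""
    total = sum(densities)
    if abs(total - 1.0) < 1e-12:
        return 0.0  # additive case: no interaction term is needed

    def f(lam):
        prod = 1.0
        for g in densities:
            prod *= 1.0 + lam * g
        return prod - (1.0 + lam)

    if total > 1.0:
        # Root lies in (-1, 0); f is positive to the left of it.
        lo, hi, sign = -1.0, 0.0, 1.0
    else:
        # Root lies in (0, +inf); f is negative to the left of it.
        lo, hi, sign = 0.0, 1.0, -1.0
        while f(hi) < 0.0:
            hi *= 2.0
            if hi > 1e12:
                raise ValueError("no positive root found for these densities")
    for _ in range(iterations):  # plain bisection
        mid = 0.5 * (lo + hi)
        if sign * f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fuzzy_measures(sorted_densities, lam):
    """Eqs. 4-5: g(A_i) for the nested sets A_i = {z_1, ..., z_i}."""
    measures, g_prev = [], 0.0
    for g in sorted_densities:
        g_prev = g_prev + g + lam * g_prev * g  # reduces to g^1 when i = 1
        measures.append(g_prev)
    return measures
```

By construction, the last value returned by `fuzzy_measures` is $g(Z)$ and should be numerically equal to 1 when `lam` comes from `solve_lambda`.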


3.1.2 Fuzzy integral operators 


The Sugeno integral $I_S$ is the first FI operator defined for
aggregating the objective evidence $h(z_i)$ with respect to a
fuzzy measure $g(A_i)$ over $Z$. It is computed as:


$$I_S = \max_{i=1}^{L} \left[ \min\left( h(z_i), g(A_i) \right) \right] \tag{7}$$


This integral represents the simplest operator for FI
combination. Hence, it has been extended through the definition
of the Choquet integral. Its discrete formulation for a
function $h : Z \to \mathbb{R}^+$ with respect to a fuzzy measure
$g$ is defined as:


$$I_C = \sum_{i=1}^{L} \left( h(z_i) - h(z_{i-1}) \right) g(A_i) \tag{8}$$


where the indices $i$ have been permuted so that
$0 \le h(z_1) \le \cdots \le h(z_L) \le 1$ and $h(z_0) = 0$.
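
A minimal Python sketch of the two operators in Eqs. 7 and 8 is given below. It is illustrative only: the function names are hypothetical, the interaction parameter `lam` is assumed to have been obtained from Eq. 6 beforehand, and the Choquet sum is written in the equivalent decreasing-order form so that the nested sets $A_i = \{z_1, \ldots, z_i\}$ of Eqs. 4-5 can be reused (this reading of $A_i$ for the Choquet integral is an assumption).

```python
def sorted_measures(evidence, densities, lam):
    """Sort sources by decreasing evidence and build g(A_i) via Eqs. 4-5."""
    pairs = sorted(zip(evidence, densities), key=lambda p: p[0], reverse=True)
    h_sorted = [h for h, _ in pairs]
    measures, g_prev = [], 0.0
    for _, g in pairs:
        g_prev = g_prev + g + lam * g_prev * g
        measures.append(g_prev)
    return h_sorted, measures


def sugeno_integral(evidence, densities, lam):
    """Eq. 7: maximum over i of min(h(z_i), g(A_i))."""
    h, g = sorted_measures(evidence, densities, lam)
    return max(min(h_i, g_i) for h_i, g_i in zip(h, g))


def choquet_integral(evidence, densities, lam):
    """Eq. 8 in decreasing-order form: sum of (h_i - h_{i+1}) * g(A_i)."""
    h, g = sorted_measures(evidence, densities, lam)
    h_next = h[1:] + [0.0]  # convention: the evidence after the last one is 0
    return sum((h_i - h_n) * g_i for h_i, h_n, g_i in zip(h, h_next, g))
```

For example, with evidences 0.9 and 0.4, densities 0.3 and 0.5 and `lam = 4/3`, the fuzzy measures are 0.3 and 1.0, so the Sugeno integral is 0.4 and the Choquet integral is 0.5 × 0.3 + 0.4 × 1.0 = 0.55.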

To further improve the FI, other operators have been proposed,
such as ordered weighted averaging (OWA) [39]. Two versions
are commonly used, namely OWA-AND and OWA-OR. The OWA-AND
requires the calculation of new evidences $h'(z_i)$ defined as:


$$h'(z_i) = \frac{1 - \alpha}{L} \sum_{j=1}^{L} h(z_j) + \alpha \min_{1 \le j \le L} \{ h(z_j) \} \tag{9}$$


These new evidences are then used in the Sugeno integral,
which is termed $I_{S\text{-}AND}$. In addition, this paper proposes
to use these new evidences in the Choquet integral, termed
$I_{C\text{-}AND}$, in order to evaluate its performance.

In contrast, the OWA-OR is performed using the same
new evidences through the new decision function defined
as follows:


$$I_{OR} = \frac{1 - \beta}{L} \sum_{i=1}^{L} \min\left( h(z_i), g(A_i) \right) + \beta \max_{i=1}^{L} \left[ \min\left( h(z_i), g(A_i) \right) \right] \tag{10}$$

Both operators need tuning parameters $\alpha$ and $\beta$ in the
unit interval in order to achieve better results than the
Sugeno and Choquet operators [40]. In the following, the five
different operators for FI are termed Sugeno ($I_S$), Choquet
($I_C$), S-AND ($I_{S\text{-}AND}$), C-AND ($I_{C\text{-}AND}$) and
OR ($I_{OR}$).


3.2 Hybrid OCC ensemble system 


The hybrid OCC ensemble depicted in Fig. 2 is composed
of $m$ classes and $L$ different information sources. Therefore,
each class is represented by a single OCC ensemble which
is composed of $L$ OCCs trained on different information
sources. Their normalized outputs are aggregated through
an FI operator. Finally, the test pattern is assigned the
class label of the single OCC ensemble that achieves the
maximum prediction.

Let $\{D_j, j = 1, \ldots, m\}$ be the set of $m$ single OCC
ensembles and denote $D_j = \{d_{ji}, i = 1, \ldots, L\}$ as the
output vector composed of the output values $d_{ji}$ provided by
the classifier $OCC_{ji}$ trained on the $i$th information source
of the $j$th class. The set of the output values can be
represented in a matrix as follows:


$$D = \begin{bmatrix} D_1 \\ \vdots \\ D_m \end{bmatrix} = \begin{bmatrix} d_{11} & d_{12} & \cdots & d_{1L} \\ \vdots & \vdots & \ddots & \vdots \\ d_{m1} & d_{m2} & \cdots & d_{mL} \end{bmatrix} \tag{11}$$


Several combination rules are possible to achieve the
hybrid OCC ensemble, but all these rules need a unique
interpretation of the outputs generated by the different
classifiers for each test pattern $x$. Hence, the outputs of
each classifier must be normalized to perform a correct
combination. Presently, the simple exponential function is
used for transforming the OCC output $d_{ji}$, ranging over
$]-\infty, 0]$, to $]0, 1]$ using the posterior probability
$P_i(c_j/x)$ as follows:


$$P_i(c_j/x) = \exp\left( d_{ji}(x) \right) \tag{12}$$


The evidence is then expressed as the posterior probability,
taking the following form:


$$h_j(z_i) = P_i(c_j/x) \tag{13}$$


Fig. 2 Hybrid OCC ensemble scheme based on fuzzy integral



The key to a successful FI is an appropriate formulation of
the density measure associated with each information source.
If the density measures are well formulated, then the fuzzy
measures can be correctly defined, which in turn leads to a
sound aggregation by the fuzzy integral. Generally, the
density measures represent the competence of each ensemble
member, measured by its accuracy. Using the performance
deduced from the training dataset is not representative,
since a separate validation dataset is required to obtain a
better evaluation of each information source. Moreover, all
test samples are represented by the same values, which makes
this approach less efficient. In fact, more weight should be
given to the most appropriate information source for each
test sample. To overcome this limitation, we propose an
alternative approach in which the density measure is
considered as the degree of similarity or correspondence
between the class model and the test pattern, i.e., the closer
the similarity, the greater the value of the density measure.
Precisely, we introduce a new dynamic density measure, which
is defined for each test sample as:

$$g_j^i(x) = \exp\left( -\delta \left| d_{ji}(x) - dm_{ji} \right| \right) \tag{14}$$

such that $0 < g_j^i(x) \le 1$ and $0 < \delta \le 1$.

Here, $dm_{ji}$ is obtained by averaging the outputs of
$OCC_{ji}$ over the training samples, and $\delta$ is a positive
value introduced to calibrate the density more effectively and
therefore to represent more accurately the contribution of
each classifier and information source. Once accurate density
measures are obtained, the fuzzy measures are deduced
according to Eqs. 4-6.
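
The normalization of Eqs. 12-13 and a per-sample density in the spirit of Eq. 14 can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the default `delta=0.5` is an arbitrary placeholder, and the deviation between the test output and the mean training output is assumed to enter through its absolute value, which is one way of keeping the density in $]0, 1]$.

```python
import math


def evidence_from_output(d):
    """Eqs. 12-13: map an OCC output d in ]-inf, 0] to an evidence in ]0, 1]."""
    return math.exp(d)


def dynamic_density(d, d_mean, delta=0.5):
    """Per-sample density (assumed form): exp(-delta * |d - d_mean|).

    d      : OCC output for the current test sample
    d_mean : average OCC output over the training samples of the class
    delta  : positive calibration value (placeholder default)
    """
    return math.exp(-delta * abs(d - d_mean))
```

With this form, a test sample whose output equals the average training output receives the maximum density of 1, and the density decays as the output moves away from that average, so each test sample gets its own set of densities.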

Let $y(x)$ be the class label of a test word $x$. A specific FI
is applied to combine the three classifier outputs for each
class. Finally, the test word is recognized according to the
maximum FI score obtained over all classes as follows:


$$y(x) = \arg\max_{j=1}^{m} \left( FIOP_j\left( h_j(z_i), g_j(A_i) \right) \right), \quad 1 \le i \le L \tag{15}$$


where FIOP is one of the defined FI operators.

In summary, the multi-class classification using the FI
is performed according to Algorithm 1.
Algorithm 1:

Inputs: Word image $I$, classifier models $OCC_{ji}$, average training outputs $dm_{ji}$
Output: Class label

1: for j ← 1 to m do /* j is the class index */
2:   for i ← 1 to L do /* i is the feature generation method (information source) index */
3:     Calculate the word feature vector x of the word image I
4:     Calculate the classifier output $d_{ji}(x)$
5:     Compute the normalized output $P_i(c_j / x)$ and the evidence value $h_j(z_i)$ according to Eqs. 12 and 13
6:     Calculate the density measure according to Eq. 14
7:   end for
8:   Determine the fuzzy measures $g_j(A_i)$, for $1 \le i \le L$, using Eqs. 4, 5 and 6
9:   Perform the FI operator to determine the combination score from the evidence and fuzzy measures
10: end for
11: Assign the unknown word to the word class that provides the maximum FI combination score according to Eq. 15
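
The steps of Algorithm 1 can be sketched end to end in Python, starting from OCC outputs that are assumed to be already computed (feature extraction and the OCC models themselves are left out). This is a hedged illustration, not the authors' code: all function names are hypothetical, the Sugeno integral stands in for FIOP, the density uses an assumed absolute-deviation form with a placeholder `delta`, and bisection is used to solve Eq. 6.

```python
import math


def solve_lambda(densities, iterations=200):
    """Root of Eq. 6 greater than -1 (0.0 when the densities sum to 1)."""
    total = sum(densities)
    if abs(total - 1.0) < 1e-12:
        return 0.0

    def f(lam):
        prod = 1.0
        for g in densities:
            prod *= 1.0 + lam * g
        return prod - (1.0 + lam)

    if total > 1.0:
        lo, hi, sign = -1.0, 0.0, 1.0   # f > 0 to the left of the root
    else:
        lo, hi, sign = 0.0, 1.0, -1.0   # f < 0 to the left of the root
        while f(hi) < 0.0 and hi < 1e12:
            hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if sign * f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sugeno_score(evidence, densities):
    """Steps 8-9: fuzzy measures (Eqs. 4-6), then the Sugeno integral (Eq. 7)."""
    lam = solve_lambda(densities)
    pairs = sorted(zip(evidence, densities), key=lambda p: p[0], reverse=True)
    score, g_prev = 0.0, 0.0
    for h, g in pairs:
        g_prev = g_prev + g + lam * g_prev * g
        score = max(score, min(h, g_prev))
    return score


def classify(outputs, mean_outputs, delta=0.5):
    """outputs[j][i]: OCC output d_ji(x); mean_outputs[j][i]: dm_ji.

    Returns the index of the winning class (Eq. 15) and all class scores.
    """
    scores = []
    for d_row, dm_row in zip(outputs, mean_outputs):            # classes j
        evidence = [math.exp(d) for d in d_row]                  # Eqs. 12-13
        densities = [math.exp(-delta * abs(d - dm))              # assumed Eq. 14
                     for d, dm in zip(d_row, dm_row)]
        scores.append(sugeno_score(evidence, densities))
    label = max(range(len(scores)), key=scores.__getitem__)
    return label, scores
```

Because each class score depends only on that class's own OCC outputs and training averages, appending a row for a new class leaves the scores of the existing classes unchanged, which is the open-lexicon property the scheme aims for.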


4 Experimental results 
4.1 Dataset description 


In order to evaluate the proposed approach on a large number
of classes, the well-known IFN/ENIT dataset [41] is selected.
It contains more than 26,400 images of Tunisian town names
written in Arabic script. The words were written by 411
writers using different writing tools. The IFN/ENIT dataset is
composed of four folds, A, B, C and D. Usually, experiments
are carried out using three folds for training and one for
testing. The present work is evaluated using 300 classes, each
trained with more than 10 samples. All results are reported in
terms of classification accuracy expressed as a percentage (%).


4.2 Information source generation 


Various techniques have been proposed for generating
features from the word image [42]. Recently, deep learning
has been successfully used for representing handwritten
words. Indeed, Bluche et al. [43] have shown that learning
features with convolutional neural networks (CNN) is better
than using hand-crafted features for handwritten word
recognition. In other work, Poznanski and Wolf [44] employed
a CNN to estimate the n-gram frequency profile of a word,
which is the set of n-grams contained in the word.

In this paper, the curvelet transform (CT) is used for its
enhanced directional capacity to characterize the edges and
singularities along the curves that compose a handwritten
Arabic word [44]. CT has been employed in various
applications such as image denoising [45, 46], face and facial
expression recognition [47, 48], compression [49], texture
classification [50], content-based image retrieval [51] and
character recognition [52, 53]. Recently, the CT has been
successfully used for offline handwritten signature retrieval
[54] and verification [55].

Here, the CT is first applied to the word image via the
wrapping technique at different scales and orientations to
generate curvelet coefficients. These coefficients are used to
compute the energies, which characterize the importance of the
curvature contained in the word image. In order to capture
local information more effectively, the CT is applied to
different sections of the word image grid. Finally, the
feature vector is obtained by concatenating all the wedge
energies computed for the defined image sections.

For the combination stage, different information sources
should be generated from the word image [35, 41]. To this end,
three different ways are used for partitioning an image into
sections (before applying the curvelet transform), namely
equispaced, equimass and equimass adaptive grids. A uniform
or equispaced grid [56] creates rectangular sampling regions
of the same size and shape. It is obtained by placing grid
lines at equally spaced locations along the x-axis of the word
image to create the vertical regions. Similarly, the
horizontal regions are produced by placing grid lines at
equally spaced locations down the y-axis. Conversely, an
equimass grid creates rectangular regions of different sizes
that contain the same number of black pixels, also known as
the mass, of the word image [56]. Consequently, each region is
found by partitioning the word image horizontally and
vertically using its mass histogram. Hence, the total mass
between adjacent grid lines on either the x-axis or the y-axis
is as close to equal as possible. Additionally, the equimass
adaptive grid is used, which is a small modification of the
equimass grid. It computes the partition along the x-axis
separately for each horizontal region rather than for the
entire image as in the equimass grid. Figure 3 shows an
example of partitioning a handwritten word image using the
three methods.

Fig. 3 Arabic handwritten word decomposition having the same grid size 2 × 8: a equispaced grid, b equimass grid and c equimass adaptive grid
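
The difference between equispaced and equimass partitioning along the x-axis can be illustrated with a short Python sketch. It is not the authors' implementation: the function names are hypothetical, and the word image is assumed to be a binary list of rows in which 1 marks a black (ink) pixel.

```python
def equispaced_cuts(width, n_regions):
    """Column indices splitting [0, width) into n_regions equal-width strips."""
    return [(k * width) // n_regions for k in range(1, n_regions)]


def equimass_cuts(image, n_regions):
    """Column indices so that each vertical strip holds roughly the same mass.

    image: list of rows, each a list of 0/1 values (1 = black pixel).
    """
    width = len(image[0])
    # Mass histogram: number of black pixels in each column.
    column_mass = [sum(row[x] for row in image) for x in range(width)]
    total = sum(column_mass)
    cuts, running, k = [], 0, 1
    for x, mass in enumerate(column_mass):
        running += mass
        # Place a grid line as soon as the k-th share of the mass is reached.
        while k < n_regions and running >= k * total / n_regions:
            cuts.append(x + 1)
            k += 1
    return cuts
```

Applying `equimass_cuts` to the transposed image gives the horizontal grid lines, and applying it separately to each horizontal band, instead of the whole image, corresponds to the equimass adaptive variant described above.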


4.3 Experimental design 


To evaluate the hybrid OCC ensemble as a parameter-independent,
open-lexicon Arabic word recognition system, experiments are
conducted in two steps: a design step and an evaluation step.
In the design step, a number of classes are randomly selected
from the whole dataset in order to determine the optimal
parameters for feature generation, the training parameters of
the OCCs and the combination model. In the evaluation step, the
remaining classes are used to evaluate the robustness of the
proposed system with the parameters found during the design
step. In other words, when a new class is added to the hybrid
OCC ensemble for evaluation, the parameters tuned during the
design step are reused unchanged.
The hybrid OCC ensemble is composed of three OCCs for each
class; each one receives its own feature vector according to
the selected feature generation method. Hence, each OCC is
trained separately on its own information source. To give a
broad view of the usefulness of the proposed architecture,
experiments are carried out on three types of OCC: principal
component analysis (PCA), K-means and nearest neighbor (NN) [22].




Therefore, different systems are built according to the type
of OCC used. These classifiers are selected for their success
in many applications and for the small number of parameters
to be tuned during training.
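The open-lexicon property of such a system can be sketched as follows. This is only an illustration under simplifying assumptions: a single nearest-neighbor one-class model per word class scored by the Euclidean distance to its closest training sample, two-dimensional toy features, and hypothetical class names. It does not reproduce the NN data description of [22] nor the three-source ensemble used in the experiments.

```python
import math


class NearestNeighborOCC:
    """One-class model: a sample is scored by its distance to the closest training sample."""

    def __init__(self, samples):
        self.samples = [tuple(s) for s in samples]

    def distance(self, x):
        return min(math.dist(x, s) for s in self.samples)


class OpenLexicon:
    """One independent OCC per word class; adding a word never touches the other models."""

    def __init__(self):
        self.models = {}

    def add_class(self, label, samples):
        # Only the model of the new class is trained.
        self.models[label] = NearestNeighborOCC(samples)

    def predict(self, x):
        # Assign the class whose one-class model lies closest to x.
        return min(self.models, key=lambda label: self.models[label].distance(x))


lexicon = OpenLexicon()
lexicon.add_class("word_a", [(0.0, 0.0), (0.1, 0.2)])
lexicon.add_class("word_b", [(1.0, 1.0), (0.9, 1.1)])
print(lexicon.predict((0.2, 0.1)))  # word_a

# Extending the lexicon with a new word leaves the existing models untouched.
lexicon.add_class("word_c", [(5.0, 5.0)])
print(lexicon.predict((4.8, 5.1)))  # word_c
print(lexicon.predict((0.2, 0.1)))  # word_a
```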

During the design step, the proposed open classification
system is highly affected by the grid size and the classifier
parameters. The NN classifier requires no parameter tuning,
unlike the other classifiers. K-means has a single parameter
that should be carefully tuned, namely the number of clusters.
PCA also needs a single parameter, the number of eigenvectors.
To find the optimal grid size and classifier parameters, 10
classes having more than 10 samples each are selected randomly.
To build the OCC models, the set of samples for each class is
divided into two subsets, Subset 1 and Subset 2. Subset 1 is
used for training the OCCs, while Subset 2 is used for finding
the optimal grid size and classifier parameters.

Results expressed in terms of classification accuracy for the
three information sources with different grid sizes, obtained
with the best classifier parameters, are reported in Table 1.
As can be seen, the grid size significantly affects the
classification accuracy whatever the classifier used. Indeed,
the classification accuracy increases with the grid size.
Therefore, the grid parameters are selected to achieve the
best accuracy. In this case, the grid size 2 × 8 offers the
best accuracy and the optimal feature vector size for all
classifiers and for the three grid types. Therefore, this grid
size is selected for all subsequent experiments.


Table 1 Classification accuracy (%) of individual OCCs with various equispaced (ES), equimass (EM) and equimass adaptive (EA) grid sizes

Grid size  # Features  PCA                  K-means              NN
                       ES     EM     EA     ES     EM     EA     ES     EM     EA
2 × 2      192         80.78  84.98  86.78  81.98  81.38  38223  74.17  74.17  74.40
2 × 4      384         91.89  93.39  93.99  89.18  91.29  91.89  81.38  84.38  85.70
2 × 6      576         94.29  95.19  95.79  92.49  92.49  95.19  84.08  86.78  89.18
2 × 8      768         95.79  96.09  96.69  93.09  93.99  96.69  93.09  93.99  96.39

Fig. 4 Effect of calibration parameter δ on the different operators: a PCA, b K-means and c NN classifiers

Once the individual classifiers are designed, the combination
is performed by means of the FI operators. As already
mentioned, the use of FI requires a careful tuning of
parameters, namely the calibration parameter (δ) related to the
fuzzy densities according to Eq. 14, and the couple (α, β)
related to the OR, C-AND and S-AND operators. As for the
individual systems, suitable values of the parameters (δ, α, β)
are tuned using the same 10 classes. Results for the operators
with values of the calibration parameter δ ranging from 0.1 to
1 with a step of 0.1 are depicted in Fig. 4. Moreover, for the
OR, C-AND and S-AND operators, the presented results are
obtained with the best values of the couple (α, β), varied in
the range [0, 1]. From the presented results, it is worth
noting that Choquet and C-AND are the best and most suitable FI
operators. However, they are highly affected by the calibration
parameter δ. Therefore, careful tuning should be carried out in
order to achieve the best performance. For instance, for the NN
classifier, the accuracy of the C-AND operator varies from
96.09 to 98.49% when the calibration parameter varies in the
range [0.1, 1], which justifies its introduction into the
proposed density measure function. Table 2 reports the optimal
parameters (δ, α, β) selected during the design step to achieve
the best classification accuracy.
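For reference, the Choquet and Sugeno integrals with respect to a Sugeno λ-fuzzy measure can be sketched as below. This is the textbook formulation and rests on several assumptions: the scores of the three information sources are normalized to [0, 1], and the fuzzy densities are passed in as fixed numbers. The dynamic density measure of Eq. 14 and its calibration parameter δ are not reproduced here, and the function names are hypothetical.

```python
import math


def solve_lambda(densities, iterations=200):
    """Root of prod(1 + lam * g_i) = 1 + lam with lam > -1 and lam != 0.

    Assumes at least two densities, each strictly between 0 and 1.
    Returns 0 when the densities already sum to 1 (additive measure).
    """
    total = sum(densities)
    if abs(total - 1.0) < 1e-9:
        return 0.0

    def f(lam):
        return math.prod(1.0 + lam * g for g in densities) - (1.0 + lam)

    if total < 1.0:  # the root is positive
        lo, hi = 1e-9, 1.0
        while f(hi) < 0.0:
            hi *= 2.0
    else:            # the root lies in (-1, 0)
        lo, hi = -1.0 + 1e-9, -1e-9
    sign_lo = f(lo) > 0.0
    for _ in range(iterations):  # plain bisection on the bracketing interval
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0.0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def _sorted_scores_and_measures(scores, densities):
    """Scores in decreasing order and the measures g(A_i) of the nested source sets."""
    lam = solve_lambda(densities)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    measures, g_prev = [], 0.0
    for i in order:
        g_prev = densities[i] + g_prev + lam * densities[i] * g_prev
        measures.append(g_prev)
    return [scores[i] for i in order], measures


def choquet(scores, densities):
    h, g = _sorted_scores_and_measures(scores, densities)
    h = h + [0.0]
    return sum((h[i] - h[i + 1]) * g[i] for i in range(len(g)))


def sugeno(scores, densities):
    h, g = _sorted_scores_and_measures(scores, densities)
    return max(min(hi, gi) for hi, gi in zip(h, g))


scores = [0.9, 0.5, 0.2]      # hypothetical outputs of the ES, EM and EA sources
densities = [0.5, 0.3, 0.2]   # hypothetical fixed densities
print(round(choquet(scores, densities), 4))  # 0.64
print(round(sugeno(scores, densities), 4))   # 0.5
```

When the densities sum to one, λ vanishes and the Choquet integral reduces to a weighted average, which is the case shown in the example; otherwise the λ-measure introduces interaction between the sources.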


4.4 Combination results 


Results for the individual classifiers and the hybrid OCC
ensembles with different combination rules according to the
selected OCC are reported in Table 3. When comparing the
individual classifiers, we can note that PCA is the most
suitable for this application. Moreover, for a fair comparison
and analysis of the combination schemes, various FI operators
are evaluated against fixed combiners including

Table 2 Optimal parameters (δ, α, β) selected for each operator and OCC during the design step

Classifier  Operator
            S-AND (δ, α)   C-AND (δ, α)   OR (δ, α, β)
PCA         (0.05, 0.3)    (0.2, 0.3)     (0.05, 0.2, 0.6)
K-means     (0.05, 0.5)    (0.5, 0.5)     (0.75, 0.3, 0.5)
NN          (0.05, 0.8)    (0.35, 0.8)    (0.05, 0.6, 0.3)

average (Avg), product (Prod), minimum (Min) and maximum
(Max) [33, 55, 57]. The obtained results reveal that combining
information sources considerably improves the classification
accuracy over a single source for all classifiers. For
instance, when using the hybrid PCA for the test set A, the
classification accuracy is improved by more than 7% against the
best single information source, which confirms the
effectiveness of using multiple systems rather than a single
one. Secondly, for the combination strategy, we can clearly
observe that the average and product are the best combiners
from the fixed group, while Choquet is the best from the FI
group. Besides, when comparing the fixed group against the FI
one, we can note that the Choquet combiner and its extension
C-AND offer a better recognition rate than the best fixed
aggregators. Therefore, FI combiners are more suitable than
fixed ones for the hybrid OCC ensemble. The Choquet integral
seems the most suitable for building the hybrid OCC ensemble,
since it yields better results than C-AND without requiring any
additional parameter.
In order to give an extended view of the performance of the
proposed system, results are provided in Table 4 with
Table 4 Classification accuracy (%) of individual classifiers and hybrid one-class ensembles with different combination rules for 300 classes according to Top-N

        OCC ensemble  Sources              Fuzzy integral operator
                      ES     EM     EA     Choquet
Top-1   PCA           71.14  73.32  76.26  83.73
        K-means       60.43  62.65  67.35  77.64
        NN            60.22  62.69  68.15  77.55
Top-2   PCA           79.87  80.16  82.88  87.66
        K-means       68.67  68.96  71.22  78.00
        NN            68.15  69.12  71.08  77.62
Top-3   PCA           87.98  90.19  91.97  92.60
        K-means       71.75  72.41  74.32  80.29
        NN            71.22  72.32  74.85  80.16
Top-4   PCA           90.35  91.60  93.25  95.70
        K-means       74.15  74.90  76.74  82.84
        NN            73.85  75.00  76.57  82.78
Top-5   PCA           91.95  92.66  94.88  96.17
        K-means       75.57  76.59  78.38  83.96
        NN            75.48  76.80  78.67  83.73

The bold result defines the best classifier


Table 3 Classification accuracy (%) of individual classifiers and hybrid one-class ensembles with different combination rules for 300 classes

OCC ensemble  Sources              Fixed rules                 Fuzzy integral operators
              ES     EM     EA     Avg    Prod   Max    Min    Sugeno  C-AND  OR     Choquet  S-AND
PCA           71.14  73.32  76.26  82.02  82.02  71.11  ai     79.91   81.86  81.84  83.73    78.18
K-means       60.43  62.65  67.35  76.43  76.32  62.61  69.07  70.51   75.12  71.43  77.64    71.88
NN            60.22  62.69  68.15  76.26  76.21  62.64  69.75  72.45   75.67  73.29  77.55    73.22

different Top-N settings for each type of OCC used to build the
open hybrid OCC ensembles. From the obtained results, we can
clearly notice the effect of the selected Top-N on word
recognition. Furthermore, the OC-PCA-based open hybrid OCC
ensemble achieves the best word recognition accuracy, providing
83.73 and 96.17% for Top-1 and Top-5, respectively.
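The Top-N criterion counts a test word as correctly recognized when its true class appears among the N best-ranked classes. A minimal sketch is given below, assuming that every test sample comes with one combined score per class and that a higher score means a better match; the function name and the toy scores are hypothetical.

```python
def top_n_accuracy(score_rows, true_labels, n):
    """Percentage of samples whose true label is among the n best-scoring classes."""
    hits = 0
    for scores, truth in zip(score_rows, true_labels):
        ranked = sorted(scores, key=scores.get, reverse=True)
        hits += truth in ranked[:n]
    return 100.0 * hits / len(true_labels)


# One dictionary of class scores per test sample (toy values).
score_rows = [
    {"a": 0.90, "b": 0.50, "c": 0.10},
    {"a": 0.60, "b": 0.30, "c": 0.50},
    {"a": 0.20, "b": 0.30, "c": 0.25},
]
true_labels = ["a", "c", "a"]

for n in (1, 2, 3):
    print(n, round(top_n_accuracy(score_rows, true_labels, n), 2))
# 1 33.33
# 2 66.67
# 3 100.0
```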

In order to show the effectiveness of the Choquet operator
against the remaining combination rules, we use McNemar's
test [58], which allows a statistical comparison of two
systems. More precisely, a contingency table is constructed
in order to calculate the p value.

McNemar's test indicates whether one system is significantly
better than another according to the p value. A small p value
indicates a significant difference in classification accuracy
between the two systems being compared. In contrast, when the
p value exceeds 0.05, the null hypothesis cannot be rejected.
In this case, both systems perform closely and the difference
is too small to decide the superiority of one system over the
other.
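As an illustration, the p value can be obtained from the two discordant counts of the contingency table. The sketch below assumes the chi-square form of McNemar's test with continuity correction and one degree of freedom; the text does not state which variant of the test was used, and the function name and the toy outcomes are hypothetical.

```python
import math


def mcnemar_p_value(correct_a, correct_b):
    """Two-sided McNemar test with continuity correction (chi-square, 1 dof).

    correct_a[i] and correct_b[i] tell whether systems A and B classified
    test sample i correctly.
    """
    # Discordant pairs: b = A right and B wrong, c = A wrong and B right.
    b = sum(1 for a, bb in zip(correct_a, correct_b) if a and not bb)
    c = sum(1 for a, bb in zip(correct_a, correct_b) if not a and bb)
    if b + c == 0:
        return 1.0
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of the chi-square law with 1 dof: erfc(sqrt(s / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


# Toy outcomes on 100 samples: A alone is right 25 times, B alone 5 times.
correct_a = [True] * 25 + [False] * 5 + [True] * 70
correct_b = [False] * 25 + [True] * 5 + [True] * 70
p = mcnemar_p_value(correct_a, correct_b)
print(p < 0.05)  # True
```

Only the samples on which the two systems disagree influence the statistic; samples that both systems classify correctly, or both misclassify, do not contribute.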

Table 5 reports the p values of the Choquet operator against
the remaining combination rules for PCA, K-means and NN
classifiers according to Top-1. The obtained results show that
the Choquet operator is significantly different from the other
combination rules for the different types of classifiers,
except for the C-AND operator, where a small difference can be
observed in some cases for which the p values exceed 0.05.

McNemar's test shows that Choquet and C-AND are more robust
than the fixed rules whatever the OCC and dataset used. Indeed,
the obtained p values are very small, which shows the large
difference between the aggregators.

These findings demonstrate the success of the Choquet operator
in improving the results of the hybrid OCC ensemble, which is
due to its ability to capture interactions among the OCCs and
to attribute the right importance to each information source.
According to the definition of the Choquet FI, the appropriate
weight values are deduced dynamically from the fuzzy measure
for each test pattern, which is the key to its success.
Moreover, the proposed dynamic density measure, which is
associated with the fuzzy measure, offers a better adaptation
by giving more importance to the relevant information source
relative to the others.



4.5 Stability assessment of combination rules 


The proposed hybrid OCC ensembles using FI operators are
evaluated to show the behavior of the parameter-independent
open classification when new classes are progressively added
to the system. To this end, we use the stability criterion,
which defines the ability of an OCC or a hybrid OCC ensemble to
maintain the same classification accuracy when the number of
classes in the system is progressively increased.

Hence, the classification accuracy is computed for the hybrid
OCC ensemble by progressively adding new classes, from 10 to
300, using the parameters found in the design step. Figure 5
depicts the classification accuracy versus the number of
classes achieved by the Choquet, C-AND and Sugeno operators
against the best fixed rules, average and product, for the
hybrid OCC ensembles. Roughly speaking, most combiners achieve
similar classification accuracy for few classes. However, as
new classes are progressively added and the problem becomes
more complicated, the combination methods behave differently.
When comparing the set of aggregators, the Choquet operator is
the most stable and clearly better than the fixed combiners,
since it best maintains its classification accuracy while new
classes are added. Indeed, the gap between the best and the
worst aggregators increases progressively when the hybrid OCC
ensemble is extended to a more complex problem by adding new
classes. Consequently, the Choquet operator provides the best
performance and is the least affected when adding new classes.
Conversely, surprising results are obtained with the Sugeno FI,
which is highly affected by the number of classes, showing that
it is not appropriate for combining hybrid OCC ensembles.
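The stability protocol can be sketched as a loop that enlarges the set of active classes and recomputes the accuracy on the test samples of those classes only. The sketch assumes that each class is represented by a scoring function where a higher value means a better match; the one-dimensional models and samples are toy values chosen so that the third class is confusable with the first one, and all names are hypothetical.

```python
def accuracy_vs_classes(models, test_set, steps):
    """Accuracy (%) when only the first n class models are active, for each n in steps.

    models: ordered list of (label, score_fn); test_set: list of (x, label).
    """
    curve = []
    for n in steps:
        active = models[:n]
        labels = {label for label, _ in active}
        subset = [(x, y) for x, y in test_set if y in labels]
        hits = sum(
            1 for x, y in subset
            if max(active, key=lambda m: m[1](x))[0] == y
        )
        curve.append((n, 100.0 * hits / len(subset)))
    return curve


# Toy one-dimensional class models scored by negative distance to a center.
models = [
    ("a", lambda x: -abs(x - 0.0)),
    ("b", lambda x: -abs(x - 10.0)),
    ("c", lambda x: -abs(x - 1.0)),   # close to class "a"
]
test_set = [(0.2, "a"), (0.6, "a"), (9.5, "b"), (1.2, "c")]

print(accuracy_vs_classes(models, test_set, steps=[2, 3]))
# [(2, 100.0), (3, 75.0)]
```

No model is retrained when a class is added; the accuracy drops only because the new class competes with the existing ones, which is the effect measured by the stability criterion.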


4.6 Discussions 


Nowadays, the ability to extend a multi-class implementation
to new classes is strongly required, for instance, in Arabic
word recognition and handwritten writer identification. Indeed,
the use of OCCs for solving the multi-class classification
problem has been discussed by many researchers due to the
properties it offers. This type of classifier attempts to model
each class separately from the others, which allows designing
an open multi-class system. This property is highly desirable
for current systems, since it is possible to add new classes
without retraining the classification system


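Table 5 The p values of McNemar's test for Choquet FI versus the other combination rules for hybrid OCC ensembles according to Top-1

OCC ensemble  Fixed rules                                         Fuzzy integral operators
              Avg          Prod         Max          Min          C-AND     OR           Sugeno    S-AND
PCA           82810. ^     22810 P      1.17 × 10⁻⁸  < 10⁻¹⁶      «1076     Á c10796     < 10⁻¹⁶   < 10⁻¹⁶
NN            1.50107      2.16 × 10⁻⁸  < 10⁻¹⁶      < 10⁻¹⁶      3.5010?   9.74107!ł4   < 10⁻¹⁶   < 10⁻¹⁶
K-means       1.77 × 10⁻⁹  2.001 x10    «1076        < 10⁻¹⁶      42810?    < 10⁻¹⁶      < 10⁻¹⁶   < 10⁻¹⁶
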
on all the classes. However, an individual OCC does not allow
designing a robust classification system. Therefore, a hybrid
OCC ensemble is required in order to achieve the best possible
classification performance and robustness while keeping the
properties of OCCs. Thus, this paper discusses the appropriate
use of FI for designing the open-lexicon handwritten Arabic
word recognition system. Various works have reported that FI
provides excellent results for classifier aggregation. However,
when adapting FI to the hybrid OCC ensemble, a problem arises,
namely the generation of the density measure. Indeed, it is
usually estimated using the training datasets. However, this
estimation is not considered representative and requires a new
validation dataset to obtain a better evaluation of each
information source. Furthermore, all test samples are
represented by the same density values, which makes this
approach less efficient, since each test sample has its own
suitable density measure. Therefore, the contribution of each
information source should depend on both the relevance of the
source and the test sample. To this end, a dynamic density
measure is proposed, which attributes appropriate values to
each information source and adapts them to each test pattern.

In order to design a parameter-independent system, a new
framework is proposed that relies on a separate dataset for
tuning the parameters and selecting their optimal values. This
is a useful property since, once the parameters are found, they
are kept the same for all existing classes and also for new
ones.

In this step, we have seen the effect of all parameters,
including those of the descriptor, the classifier, the density
measure and the OWA operators. Among these parameters, we
notably mention the impact of the calibration parameter used in
the density measure to enhance the combination performance. It
can also be noticed that the calibration parameter is more
suitable for the Choquet and C-AND operators. From all
experimental results, we clearly notice the improvement
achieved by the FI combination scheme based on the Choquet
operator against the other operators and fixed rules. More
precisely, the Choquet FI keeps the classification accuracy
roughly stable when new words are added to the lexicon, in
contrast to the other combination rules.

Table 6 Classification accuracy (%) achieved by the proposed open-lexicon handwritten word recognition based on OCC compared with other state-of-the-art systems using the SVM classifier

Reference                    #Classes  Features            Classifier  Accuracy (%)
Nemmour and Chibani [59]     24        Ridgelet transform  SVM         84.00
Khalifa and Bingru [60]      56        SIFT                SVM         91.70
Alalshekmubarak et al. [61]  24        Uniform grid        SVM         95.27
                             56                                        92.37
Proposed system              24        Curvelet            OC-PCA      98.21
                             56                                        96.70
                             300                                       83.73

In order to situate the proposed open-lexicon handwritten word
recognition system based on OCC, a comparative study has been
performed against some recent studies that have explored the
SVM classifier on the IFN/ENIT dataset.

Table 6 reports the classification accuracy achieved by the
open-lexicon system versus other systems using the SVM
classifier. The results reveal that the open-lexicon system
based on OC-PCA offers better results than the other systems
using the SVM classifier in terms of classification accuracy,
with the added advantage of keeping the system open to new word
classes. Moreover, we should also highlight the impact of FI in
improving the results and its good fit with the OCC, which
preserves the open-lexicon concept. This suggests that OC-PCA
may be more representative for Arabic words than the SVM
classifier [59–61].


5 Conclusion 


This paper has investigated the usefulness of combining OCCs
for open-lexicon handwritten Arabic word recognition. The main
advantage of the proposed approach is that it offers an open
system, which is applied here for the first time to handwritten
Arabic word recognition. For instance, a closed system is
inconvenient for address recognition, whereas an open-lexicon
system allows the list of addresses to be updated more
efficiently. Furthermore, since fixed rules are the standard
combiners for the hybrid OCC ensemble, the proposed work
studied the potential of FI operators by proposing a
combination scheme for an ensemble of OCCs designed with
different feature generation methods.

Experimental analysis is conducted on different types of OCC
and on Arabic handwritten word datasets with a large number of
classes. The results show the superiority of FI over the fixed
combiners, which represent our basis of comparison, whatever
the selected type of OCC. Furthermore, the Choquet operator
seems to be the most suitable and powerful among the FI
aggregators. Thus, this study suggests keeping fuzzy integral
operators as a way of achieving robust hybrid OCC ensembles for
open-lexicon handwritten Arabic word recognition.

For future work, we plan to propose a new architecture of the
hybrid OCC ensemble, which relies on the dissimilarity learning
approach in order to generate a generic model for the
open-lexicon classification system.